Voting on N-grams for Machine Translation System Combination
Authors
Abstract
System combination exploits differences between machine translation systems to form a combined translation from several system outputs. Core to this process are features that reward n-gram matches between a candidate combination and each system output. Systems differ in performance at the n-gram level despite similar overall scores. We therefore advocate a new feature formulation: for each system and each small n, a feature counts n-gram matches between the system and candidate. We show post-evaluation improvement of 6.67 BLEU over the best system on NIST MT09 Arabic-English test data. Compared to a baseline system combination scheme from WMT 2009, we show improvement in the range of 1 BLEU point.
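To make the proposed feature formulation concrete, the sketch below is a minimal illustration, not the authors' implementation: it assumes tokenized outputs, and the function names and the clipping of counts are our own choices. It computes one feature per system and per small n, namely the number of n-gram matches between a candidate combination and that system's output.

from collections import Counter

def ngrams(tokens, n):
    # Multiset of the n-grams in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def match_features(candidate, system_outputs, max_n=4):
    # One feature per (system, n): clipped count of candidate n-grams that
    # also occur in that system's output (hypothetical sketch of the idea).
    cand = {n: ngrams(candidate, n) for n in range(1, max_n + 1)}
    features = {}
    for s, output in enumerate(system_outputs):
        for n in range(1, max_n + 1):
            sys_grams = ngrams(output, n)
            features[(s, n)] = sum(min(c, sys_grams[g]) for g, c in cand[n].items())
    return features

# Toy usage: two component system outputs and one candidate combination.
candidate = "the cat sat on the mat".split()
outputs = ["the cat sat on a mat".split(), "a cat is on the mat".split()]
print(match_features(candidate, outputs, max_n=2))

In a log-linear combiner each of these counts would typically be scaled by a tuned weight, so every component system effectively votes on the candidate's n-grams.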
Similar resources
Using N-gram based Features for Machine Translation System Combination
Conventional confusion network based system combination for machine translation (MT) heavily relies on features that are based on the measure of agreement of words in different translation hypotheses. This paper presents two new features that consider agreement of n-grams in different hypotheses to improve the performance of system combination. The first one is based on a sentence specific onli...
Statistical Machine Translation of Euparl Data by using Bilingual N-grams
This work discusses translation results for the four Euparl data sets which were made available for the shared task “Exploiting Parallel Texts for Statistical Machine Translation”. All results presented were generated by using a statistical machine translation system which implements a log-linear combination of feature functions along with a bilingual n-gram translation model.
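As a rough illustration of what a log-linear combination of feature functions means in practice, the toy sketch below scores a hypothesis as a weighted sum of feature values. The feature names and weights are invented for illustration; the bilingual n-gram model appears only as an assumed log-probability feature.

def log_linear_score(feature_values, weights):
    # Standard log-linear model score: sum_k lambda_k * h_k(e, f).
    return sum(weights[k] * feature_values[k] for k in feature_values)

# Hypothetical feature values for one translation hypothesis.
features = {
    "log_bilingual_ngram_tm": -12.4,  # bilingual n-gram translation model log-prob (assumed)
    "log_target_lm": -8.7,            # target language model log-prob
    "word_penalty": -6.0,             # length feature
}
weights = {"log_bilingual_ngram_tm": 1.0, "log_target_lm": 0.6, "word_penalty": 0.2}
print(log_linear_score(features, weights))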
System Combination for Machine Translation Using N-Gram Posterior Probabilities
This paper proposes using n-gram posterior probabilities, which are estimated over translation hypotheses from multiple machine translation (MT) systems, to improve the performance of system combination. Two ways of using n-gram posteriors in confusion network decoding are presented. The first is based on an n-gram posterior language model per source sentence, and the second, called n-gram se...
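For orientation, a minimal sketch of the kind of quantity involved, assuming we are given weighted hypotheses for a single source sentence; the weighting and normalization here are our simplifications, not the paper's exact estimator. The posterior of an n-gram is taken as the normalized weight of the hypotheses that contain it.

from collections import defaultdict

def ngram_posteriors(hypotheses, n=2):
    # hypotheses: list of (tokens, weight) pairs for one source sentence.
    # Returns P(g): normalized total weight of the hypotheses containing n-gram g.
    total = sum(w for _, w in hypotheses)
    post = defaultdict(float)
    for tokens, w in hypotheses:
        for g in {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}:
            post[g] += w / total
    return dict(post)

hyps = [("the cat sat".split(), 0.6), ("a cat sat".split(), 0.3), ("the dog sat".split(), 0.1)]
print(ngram_posteriors(hyps, n=2))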
Exploiting N-best Hypotheses for SMT Self-Enhancement
Word and n-gram posterior probabilities estimated on N-best hypotheses have been used to improve the performance of statistical machine translation (SMT) in a rescoring framework. In this paper, we extend the idea to estimate the posterior probabilities on N-best hypotheses for translation phrase-pairs, target language n-grams, and source word reorderings. The SMT system is self-enhanced with t...
rgbF: An Open Source Tool for n-gram Based Automatic Evaluation of Machine Translation Output
We describe rgbF, a tool for automatic evaluation of machine translation output based on n-gram precision and recall. The tool calculates the F-score averaged on all n-grams of an arbitrary set of distinct units such as words, morphemes, tags, etc. The arithmetic mean is used for n-gram averaging. As input, the tool requires reference translation(s) and hypothesis, both containing the same c...
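As a rough sketch of an n-gram based F-score of this flavor (our simplification over word n-grams against a single reference, not the rgbF code), the snippet below computes per-order precision and recall and combines them with the arithmetic mean over n.

from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_fscore(hypothesis, reference, max_n=4):
    # Arithmetic mean over n of the per-order F-scores (assumed averaging scheme).
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = ngram_counts(hypothesis, n), ngram_counts(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in hyp.items())
        p = overlap / max(sum(hyp.values()), 1)
        r = overlap / max(sum(ref.values()), 1)
        scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(scores) / len(scores)

print(round(ngram_fscore("the cat sat on the mat".split(), "the cat is on the mat".split(), max_n=2), 3))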
Publication date: 2010